Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise

Authors

  • Qifeng Zhu
  • Abeer Alwan

Abstract

An analysis-based non-linear feature extraction approach is proposed, inspired by a model of how additive noise affects speech amplitude spectra. Acoustic features are extracted from the noise-robust parts of speech spectra without losing discriminative information. Two non-linear processing methods, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize the mismatch between clean and noisy speech features. A previously studied method, peak isolation [IEEE Transactions on Speech and Audio Processing 5 (1997) 451], is also discussed in terms of this model. These methods do not require noise estimation and are effective for both stationary and non-stationary noise. ASR experiments with additive noise show that using these techniques in the computation of MFCCs greatly improves recognition performance. For the TI46 isolated digits database with additive speech-shaped noise, the average recognition rate across several SNRs improves from 60% (using unmodified MFCCs) to 95% (using the proposed techniques). For the Aurora 2 connected digit-string database, the recognition rate averaged across noise types (including non-stationary backgrounds) and SNRs improves from 58% to 80%. © 2003 Elsevier Science Ltd. All rights reserved.
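
The abstract's premise is that additive noise corrupts low-energy spectral valleys far more than high-energy spectral peaks. The minimal sketch below illustrates that premise numerically. It assumes speech and noise are uncorrelated, so their power spectra add, and it uses a synthetic two-peak spectrum and a white-noise level chosen purely for illustration. The helper names `to_db` and `peak_valley_effect` are hypothetical, and the code does not reproduce the authors' data or their feature extraction.

```python
import numpy as np


def to_db(power):
    """Convert a power spectrum (linear units) to decibels."""
    return 10.0 * np.log10(power)


def peak_valley_effect(clean_power, noise_power):
    """Compare how additive noise shifts the strongest and weakest spectral bins.

    Assumption (illustrative): speech and noise are uncorrelated,
    so their power spectra simply add.
    """
    noisy_power = clean_power + noise_power
    peak = int(np.argmax(clean_power))    # bin with the highest local SNR
    valley = int(np.argmin(clean_power))  # bin with the lowest local SNR
    clean_db, noisy_db = to_db(clean_power), to_db(noisy_power)
    return {
        "peak_shift_db": float(noisy_db[peak] - clean_db[peak]),
        "valley_shift_db": float(noisy_db[valley] - clean_db[valley]),
        "clean_pvr_db": float(clean_db[peak] - clean_db[valley]),
        "noisy_pvr_db": float(noisy_db[peak] - noisy_db[valley]),
    }


# Hypothetical clean spectrum: two "formant" peaks over a low floor,
# sampled on 257 bins from 0 to 4000 Hz (8 kHz sampling assumed).
freqs = np.linspace(0.0, 4000.0, 257)
clean = (1e-3
         + 1.0 * np.exp(-((freqs - 500.0) / 80.0) ** 2)
         + 0.5 * np.exp(-((freqs - 1500.0) / 120.0) ** 2))

# Flat (white) noise power, chosen only for illustration.
result = peak_valley_effect(clean, noise_power=1e-2)
for name, value in result.items():
    print(f"{name}: {value:.2f} dB")
```

With these illustrative numbers, the spectral peak moves by well under 0.1 dB, while the valley rises by about 10 dB. As a result, the peak-to-valley ratio drops by roughly 10 dB. That shrinking contrast is the kind of clean/noisy mismatch the non-linear methods above aim to reduce. The sketch itself does not implement harmonic demodulation, peak-to-valley ratio locking, or peak isolation.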

Similar resources

Dynamic Robust Speech Recognition

Robust recognition theory has become one of the research focuses of acoustic speech recognition. The acoustic speech signal is a random process that repeatedly alternates stationary segments with non-stationary segments. However, both the current linear and stationary characteristic parameters drawn from such signals and the rigid recognition models do not adapt to such repeatedly alternating property o...

Robust Speech Recognition Features Based on Temporal Trajectory Filtering and Non-Uniform Spectral Compression

This paper proposes a new feature extraction method based on temporal trajectory filtering and non-uniform spectral compression and examines its performance on two tasks in noisy environments. Temporal trajectory filtering is effective for robust speech recognition in noisy environments because human hearing is more sensitive to relative values than to absolute values and the effect of add...

Speaker feature extraction from pitch information based on spectral subtraction for speaker identification

Robust speaker feature extraction under noise conditions is an important issue for application of a speaker recognition system. It is well known that LPC cepstrum, which expresses the spectral envelope, is effective for speaker recognition. This implies that the spectral rough structure is effective for speaker recognition. However, LPC cepstrum is a noise-sensitive feature. On the other hand, sp...

Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments

In the current ASR systems the presence of competing speakers greatly degrades the recognition performance. This phenomenon is getting even more prominent in the case of hands-free, far-field ASR systems like the “Smart-TV” systems, where reverberation and non-stationary noise pose additional challenges. Furthermore, speakers are, most often, not standing still while speaking. To address these ...

Adaptive Enhancement of Speech Signals for Robust ASR

The behavior of the least squares filter (LeSF) is analyzed for a class of non-stationary signals that are composed of multiple sinusoids whose frequencies and amplitudes may vary from block to block and which are embedded in white noise. Analytic expressions for the weights and the output of the LeSF are derived as a function of the block length and the signal SNR computed over the correspondi...

Journal title:
  • Computer Speech & Language

Volume 17

Publication date: 2003